Abstract

Background: Bacteriophages encode endolysins to lyse their host cell and allow escape of their progeny. Endolysins 
are also active against Gram-positive bacteria when applied from the outside and are thus attractive anti-bacterial 
agents. LysK, an endolysin from staphylococcal phage K, contains an N-terminal cysteine-histidine dependent 
amido-hydrolase/peptidase domain (CHAP K ), a central amidase domain and a C-terminal SH3b cell wall-binding 
domain. CHAP K cleaves bacterial peptidoglycan between the tetra-peptide stem and the penta-glycine bridge. 

Methods: The CHAP K domain of LysK was crystallized and high-resolution diffraction data was collected both from 
a native protein crystal and a methylmercury chloride derivatized crystal. The anomalous signal contained in the 
derivative data allowed the location of heavy atom sites and phase determination. The resulting structures were 
completed, refined and analyzed. The presence of calcium and zinc ions in the structure was confirmed by X-ray 
fluorescence emission spectroscopy. Zymogram analysis was performed on the enzyme and selected site-directed 
mutants. 

Results: The structure of CHAP K revealed a papain-like topology with a hydrophobic cleft, where the catalytic 
triad is located. Ordered buffer molecules present in this groove may mimic the peptidoglycan substrate. When 
compared to previously solved CHAP domains, CHAP K contains an additional lobe in its N-terminal domain, with a 
structural calcium ion, coordinated by residues Asp45, Asp47, Tyr49, His51 and Asp56. The presence of a zinc ion in 
the active site was also apparent, coordinated by the catalytic residue Cys54 and a possible substrate analogue. 
Site-directed mutagenesis was used to demonstrate that residues involved in calcium binding and of the proposed 
active site were important for enzyme activity. 

Conclusions: The high-resolution structure of the CHAP K domain of LysK was determined, suggesting the location of 
the active site, the substrate-binding groove and revealing the presence of a structurally important calcium ion. A zinc 
ion was found more loosely bound. Based on the structure, we propose a possible reaction mechanism. Future studies 
will be aimed at co-crystallizing CHAP K with substrate analogues and elucidating its role in the complete LysK protein. 
This, in turn, may lead to the design of site-directed mutants with altered activity or substrate specificity.



Background 

Bacteriophage K is a virulent phage that infects a wide 
range of staphylococci. It belongs to the Myoviridae 
family of the Caudovirales order, with a genome of 
148,317 bp [1-3]. To allow its progeny to escape from 
the host cell ("lysis from within"), it encodes the endoly- 
sin LysK, a peptidoglycan hydrolase [4]. When applied 
exogenously to the pathogen, LysK causes "lysis from 
without" or exolysis [5]. Gram-positive endolysins are 
highly specific [4], and no bacterial variants resistant to 
their phage endolysins have been found despite the use 
of mutagenesis strategies to promote the chance of 
resistance development [6]. LysK kills a wide range of
staphylococci, including methicillin-resistant
Staphylococcus aureus (MRSA) [7].

LysK contains three domains: an N-terminal cysteine- 
histidine dependent amido-hydrolase/peptidase (CHAP) 
domain, a central amidase domain and a C-terminal 
SH3b cell wall-binding domain. The LysK amidase do- 
main cleaves peptidoglycan between N-acetylmuramic 
acid and L-alanine of the stem peptide, while the 
CHAP domain hydrolyzes it between the D-alanine 
of the tetra-peptide stem and the first glycine of the
penta-glycine cross-bridge [8]. A truncated enzyme
called CHAP K , containing only the first 165 amino 
acids of LysK corresponding to the CHAP domain, also 
showed exolytic activity [9]. CHAP K is able to lyse several
staphylococcal species, independently from their
origin, their antibiotic resistance profile and their ability to 
produce exopolysaccharides (associated with biofilm for- 
mation) [10,11]. It is also effective against other related 
genera, such as Micrococcus or Streptococcus [7]. 

In order to understand the reaction mechanism and 
perhaps improve or alter the activity, we set out to 
solve the structure of CHAP K . The CHAP K domain 
was expressed in Escherichia coli, purified and crystal- 
lized. Although the crystallization procedure was not 
very reproducible and crystals grew as inter-grown 
plates, a high-resolution dataset could be collected 
from one of them, plus a dataset from a methylmercury 
chloride derivative of sufficient quality for structure 
solution by single-wavelength anomalous dispersion 
[12]. This structure was refined against both the native 
and the derivative dataset. Here we present the high-
resolution structure of the CHAP K domain solved by
X-ray crystallography.

Results and discussion

Overall structure 

The final models of the CHAP K enzyme contain amino 
acids 2-165 for each of the four protein molecules 
present in the crystallographic asymmetric units, with 
good crystallographic statistics and reasonable protein 
geometry (Table 1). The models also contain metal ions, 
waters and other solvent molecules. For the native struc- 
ture, a calcium ion, a zinc ion and a 2-(N-morpholino) 
ethanesulfonic acid (MES) molecule have been modelled 
associated with each of the protein chains, as discussed 
below. Other ordered solvent molecules have also been 
modelled in the asymmetric unit and consist of one 
glycerol molecule, four putative sodium ions and 741 
water molecules. For the derivative structure, a calcium 
ion and a 2-[4-(2-hydroxyethyl)piperazin-1-yl]ethanesulfonic
acid (HEPES) molecule have been modelled
associated with each of the protein chains, while Cys54 
is modelled as methylmercury-cysteine. In this case,
ordered solvent molecules modelled in the unit cell
include two glycerol molecules, ten additional putative
methylmercury ions, two putative chloride ions and
770 waters. Despite the lower nominal resolution of the
native dataset when compared with the derivative (1.8
vs. 1.7 Å), the general structural analyses described
below are done using the structure refined against the
native dataset, as that dataset is more complete (97.2
vs. 64.7%), contains more measured reflections (62028
vs. 48498) [10], and better maps with fewer
non-interpretable noise peaks were obtained.

Table 1 Refinement and validation statistics for the CHAP K structure

                                           Native                   Derivative
PDB code                                   4CSH                     4CT3
Space group                                P1                       P1
Cell edges (a, b, c; Å)                    39.2, 61.5, 73.2         39.0, 61.5, 72.8
Cell angles (α, β, γ; °)                   91.5, 98.7, 90.1         91.8, 98.7, 90.0
Resolution range used (Å)                  32.9-1.79 (1.88-1.79) a  61.5-1.69 (1.78-1.69)
Multiplicity                               2.0 (1.9)                3.4 (3.2)
Completeness (%)                           97.2 (94.3)              64.7 (10.4)
Mean <I/sigma(I)>                          6.3 (2.9)                11.8 (1.8)
Rsym (%) b                                 9.1 (25.1)               6.0 (62.0)
Number of reflections used                 59686 (8628)             46067 (1349)
Number of reflections used for R-free      2338 (112)               2431 (62)
R-factor c                                 0.175 (0.233)            0.181 (0.278)
R-free                                     0.201 (0.282)            0.224 (0.295)
Number of atoms (protein/water/other)      5286/741/66              5259/770/104
Average B-value/Wilson B-value (Å²)        18.4/14.5                22.2/14.9
Ramachandran statistics d (%)              97.7/100.0               98.2/100.0
R.m.s. deviations e (bonds, Å/angles, °)   0.015/1.49               0.012/1.37

a Values in parentheses are for the highest resolution bin, where applicable.
b $R_{sym} = \sum_h \sum_i |I_{hi} - \langle I_h \rangle| / \sum_h \sum_i |I_{hi}|$, where $I_{hi}$ is the intensity of the $i$th measurement of
the same reflection and $\langle I_h \rangle$ is the mean observed intensity for that reflection.
c $R = \sum_{hkl} ||F_{obs}(hkl)| - |F_{calc}(hkl)|| / \sum_{hkl} |F_{obs}(hkl)|$.
d Determined with MOLPROBITY. The percentages are indicated of residues in
favoured and allowed regions of the Ramachandran plot, respectively.
e Provided by REFMAC.
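
As an illustration of the Rsym statistic defined in footnote b of Table 1, the following minimal Python sketch computes it from repeated intensity measurements. The function name `r_sym` and the toy intensities are hypothetical and are not taken from the CHAP K datasets; skipping reflections measured only once is an assumption of this sketch, not something stated in the text.

```python
from collections import defaultdict


def r_sym(measurements):
    """Rsym = sum_h sum_i |I_hi - <I_h>| / sum_h sum_i |I_hi|.

    `measurements` is an iterable of (hkl, intensity) pairs, where hkl is
    any hashable reflection index such as a (h, k, l) tuple.
    """
    by_reflection = defaultdict(list)
    for hkl, intensity in measurements:
        by_reflection[hkl].append(intensity)

    numerator = 0.0
    denominator = 0.0
    for intensities in by_reflection.values():
        # Assumption of this sketch: reflections measured only once carry
        # no information about agreement, so they are skipped.
        if len(intensities) < 2:
            continue
        mean_intensity = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_intensity) for i in intensities)
        denominator += sum(abs(i) for i in intensities)
    return numerator / denominator


# Hypothetical toy data: one reflection measured twice, one three times.
toy = [
    ((1, 0, 0), 100.0), ((1, 0, 0), 110.0),
    ((0, 1, 0), 50.0), ((0, 1, 0), 40.0), ((0, 1, 0), 60.0),
]
print(f"Rsym = {100 * r_sym(toy):.1f}%")
```

With a multiplicity of only about 2, as for the native dataset, each mean intensity rests on very few measurements, which is worth keeping in mind when comparing Rsym values between datasets of different multiplicity.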

The four CHAP K monomers do not form extensive 
inter-monomer interfaces in the crystal, suggesting that 
in solution the protein is monomeric. When the four 
crystallographically independent monomers are com- 
pared with each other, it is observed that they are very 
similar. While in part this is due to the use of local 
non-crystallographic symmetry restraints in the refine- 
ment, the fact that including these restraints signifi- 
cantly improved correspondence of the model to the 
data supports the similarity of the four crystallograph- 
ically independent protein chains. Chains A and B on 
one hand, and chains C and D on the other, can be 
most reliably superposed, with root mean square differences
(r.m.s.d.) between C-alpha atoms of 0.07 and
0.05 Å, respectively. The r.m.s.d. between chains A or B
on one hand and chains C or D on the other are 0.23-
0.26 Å. The largest structural differences are concentrated
in residues 29-39 and 136-143, part of surface
loops that interact with each other. These differences 
between the monomers are likely caused by interaction 
with neighbouring monomers in the crystal, i.e. differ- 
ent crystal contacts. The loop consisting of residues 
136 to 143 is right next to a putative substrate-binding 
groove, so it may be somewhat more flexible to allow 
access of the substrate and release of the cleavage 
products. 
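
The r.m.s.d. values quoted above measure the average displacement between equivalent C-alpha atoms after superposition. A minimal Python sketch of that final step is given below; it assumes the two chains have already been superposed and that atoms are paired in order, and the coordinates are hypothetical rather than taken from the deposited models.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square difference between paired, already superposed atoms.

    Each argument is a list of (x, y, z) tuples in Å; atom i of the first
    list is compared with atom i of the second list.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical C-alpha positions (Å) for three equivalent residues.
chain_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
chain_b = [(0.1, 0.0, 0.0), (3.8, 0.1, 0.0), (7.6, 0.0, -0.1)]
print(f"r.m.s.d. = {rmsd(chain_a, chain_b):.2f} Å")
```

The optimal superposition itself (for example by a least-squares fit) is deliberately left out of this sketch; structure comparison programs perform that step before reporting the r.m.s.d.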

The CHAP K protein consists of a single globular domain
that contains two alpha-helices, two 3₁₀-helices
and six beta-strands (Figure 1A and B). The amino-terminal
part of the protein consists of the two alpha-helices
(I and II) interconnected by a long loop. This
long loop borders a groove in the protein, at the bottom
of which the catalytic site is located (see below).
Another loop, containing a 3₁₀-helix, connects this
amino-terminal part of the protein to a six-stranded
beta-sheet that forms the carboxy-terminal part. The
six beta-strands are arranged in an anti-parallel beta-sheet
in the topology AFBCDE (Figure 1B). The structure
of CHAP K had previously been predicted by in
silico modelling [13]. The six-stranded beta-sheet was 
predicted well, but the amino-terminal alpha-helices 
were incorrectly placed and the calcium-binding loop 
between them was not present in the model. The main 
chain atoms of the catalytic site residues were within
2 Å of their predicted positions.

Figure 1 Crystal structure of the N-terminal cysteine-histidine dependent amido-hydrolase/peptidase domain (CHAP K ) of the endolysin
LysK from staphylococcal bacteriophage K. (A) Overall structure. Beta-strands are shown in green, alpha-helices in blue and 3₁₀-helices in red.
The calcium ion is shown in grey, the zinc ion in white. The N-terminal end (Nt), residue 165, the alpha-helices and the beta-strands are labelled.
(B) Topology diagram. The same labelling is used as in panel A. (C) Superposition of CHAP K (magenta) onto the CHAP domain of
the streptococcal phage endolysin PlyC (PDB entry 4F88; cyan). (D) Space-filling representation with conserved residues in almost the same
orientation as panel A, but slightly tilted forward to better illustrate the hydrophobic groove, which is indicated with an arrow. The colour coding
goes from blue for less conserved residues, via white, to purple for the most conserved residues.



When the structure is analyzed, it is clear that
CHAP K belongs to the cysteine protease CA peptidase
clan (Pfam: CL0125; http://pfam.xfam.org/; Ref. [14]),
with a papain-like fold. CHAP K is a member of the
CHAP family of this clan (Pfam: PF05257), as expected
from sequence homology. A structural similarity search
revealed that the most similar structure is the CHAP
domain of the streptococcal phage endolysin PlyC
(PDB entry 4F88) [15], with a root mean square difference
(r.m.s.d.) of 2.5 Å when the backbone atoms of
124 residues are superposed onto CHAP K (Z-score
11.4). The next most similar structure is the C-terminal
endopeptidase domain of the NlpC/P60 family cell-wall
remodelling protein from Bacillus cereus (PDB code 3H41;
Ref. [16]), with an r.m.s.d. of 2.8 Å when the backbone
atoms of 114 residues are superposed (Z-score 10.2).



When the PDB database is searched for sequence- 
similar structures, the first hit is the CHAP domain 
from Staphylococcus saprophyticus CHAP domain pro- 
tein (PDB entry 2K3A) [17], with a sequence identity of 
28% over a stretch of 94 residues. However, this structure
cannot be superimposed as well as those previously
mentioned (r.m.s.d. of 3.4 Å when backbone
atoms of 101 residues are superposed, Z-score 6.3) and
our attempts to solve the CHAP K structure by molecular
replacement using this model were unsuccessful.
This lower similarity may be due to the fact that this
structure was determined by NMR spectroscopy rather
than crystallography. Superposition of the CHAP K
structure onto the CHAP domain of the streptococcal
phage endolysin PlyC (PDB entry 4F88) is shown in
Figure 1C. The two alpha-helices and six beta-strands
of CHAP K superpose quite well with the backbone of
the homologous structures, but the loops, including the
3₁₀-helices, are very different.

The globular CHAP K protein has a relatively long and 
deep hydrophobic groove. When sequence conservation 
is mapped onto the surface, one notices that several residues
lining the groove are highly conserved (Figure 1D;
the sequence alignment underlying this figure is in
Additional file 1: Table S1). In the native structure, a
MES molecule is located in this groove (Figure 2, PDB 
entry 4CSH), while in the derivative structure a HEPES 
molecule is present (PDB entry 4CT3). These molecules 
may well be mimicking the natural peptidoglycan sub- 
strate of the protein. Residues in the groove that might 
contact the peptidoglycan substrate are: Phe36, Asp47, 
Tyr49, Tyr50, Gln53 and Cys54 from the loop between 
helices 1 and 2; Asp56 and Thr59 from helix 2; Arg71, 
Trp73 and Asn75 from the loop between helix 2 and 
beta-strand A; Trp115 and His117 from the BC-loop
and Asn136 and Trp137 from the DE-loop.

Bound metal ions 

While building and refining the protein model, relatively
strong density peaks were observed near the terminal
atoms of the side-chains of Cys54 and Asp56 in each of
the four protein chains in the asymmetric unit, suggesting
the presence of metal ions. X-ray fluorescence spectroscopy
is a powerful method to identify trace elements
in biological samples [18]. Therefore, we recorded an X-ray
fluorescence spectrum from a frozen native CHAP K
protein crystal, which revealed significant amounts of
zinc and calcium (Figure 3A). Sulphur (from methionine,
cysteine residues and buffer molecules) and chlorine
(from the crystallization buffer) were also detected. The
presence of trace amounts of titanium and copper is
likely the result of interaction of the beam with certain
beamline or sample holder components not related to
the sample.

Figure 2 MES buffer molecule bound to the CHAP K enzyme
putative substrate binding site. The CHAP K protein is shown in
transparent surface and secondary structure cartoon representation;
the calcium ion is also shown.

The calcium ion is bound in the amino-terminal part 
of the protein, involving residues of the long loop 
connecting the first and second alpha-helices (residues 
17-54) and Asp56 in the second alpha-helix. It is 
bound in a monodentate way to the side chain of resi- 
dues Asp45 and Asp47 and in a bidentate way to both 
oxygen atoms of the Asp56 side chain (Figure 3B). 
Additional ligands are the main chain oxygen atoms 
of Tyr49 and His51 and an ordered water molecule. 
The coordination is octahedral and almost exclusively 
involves carbonyl oxygen atoms, as expected for calcium. 
Experimentally determined metal ion-oxygen distances
are 2.3-2.5 Å, which is also consistent with usual calcium(II)
coordination [19]. The occupancy of the calcium
site appears to be complete and the refined temperature
factors of the calcium ions are very near those of the
coordinating atoms (the temperature factors for the
calcium ions vary between 10 and 12 Å², while those for
the coordinating ligand atoms are between 7 and
14 Å²). The calcium ion is near the proposed catalytic
site (Figure 2). We propose that the calcium ion plays a 
structural role, helping to maintain the structure of the 
amino-terminal domain and thus its catalytic residues 
in the correct relative orientation. The calcium ion 
binding loop also contains residues that may be in 
contact with the substrate and thus play a role in deter- 
mining substrate specificity. In the derivative protein 
structure, the calcium is present at the same occupancy 
and with the same coordinating ligands. 

In contrast to the tightly bound calcium ion, the zinc
ions appear to be bound more loosely and the derivative
structure shows they could be replaced by methylmercury
ions upon soaking of the crystals with methylmercury
chloride. Also, the occupancy appears to be less
than unity; we estimate it to be around 0.67 based on
refinement runs performed at different occupancies.
Finally, the resulting electron density around the zinc
ions is somewhat ambiguous and we could not model
the ligands without some remaining uncertainty. The
zinc ions are coordinated by the sulphydryl group of
Cys54, the sulphonate group of the bound MES and several
water molecules (Figure 3C). Each zinc ion is also near
the main chain oxygen atom of Gly116. The coordination
distances for the zinc ion are not ideal; the zinc ion is too
close to Cys54 and too far from the coordinating oxygen
atoms. A report by another group showed that zinc ions
inhibit the LysK enzyme, while calcium ions have no
effect on activity, but significantly enhance stability of
the enzyme [20]. However, in this assay, metal ions were
not removed from the protein solution prior to testing
their effects on the enzyme. Zinc ions may play a regulatory
role, and their binding near Cys54 suggests they
may regulate access of the substrate to the catalytic site.

Figure 3 Presence of metal ions in the CHAP K crystal structure. A. X-ray fluorescence emission spectrum collected from a CHAP K crystal
irradiated with monochromatic synchrotron radiation (12.7 keV). B. Detail of the calcium ion coordination. Coordinating atoms are one Oδ atom
of each of the Asp45 and Asp47 residues, both Oδ atoms of Asp56, the main chain oxygen atoms of Tyr49 and His51 and an ordered water molecule
(behind the calcium ion in this view). C. Detail of the zinc coordination. The zinc ion is sandwiched between Cys54 and the sulphonate group of
the MES ion, about 10 Å away from the calcium ion.

The importance of the calcium ion in relation to the 
catalytic ability of CHAP K was investigated by creation 
of mutants containing a single amino acid change to 
alanine at each of the five residues involved in calcium 
coordination. Zymogram analysis demonstrated that 
mutation of residues Asp45, Asp47 and Asp56 resulted 
in the complete abolishment of the staphylolytic activ- 
ity of the enzyme (Figure 4). This result indicates that 
the coordinated calcium ion is essential for the catalytic 
mechanism of the enzyme and complements a previous 
study, which showed that the chelator EDTA was able 
to reduce CHAP K activity by 99% [21]. While mutant 
His51-Ala retained staphylolytic ability, activity of the 
enzyme was visibly reduced in comparison with the 
parental CHAP K . Mutation of Tyr49 to alanine did not 
appear to affect the staphylolytic ability of the enzyme 
as the clearing produced on a zymogram gel was comparable
to that seen for non-mutated CHAP K (Figure 4). That
mutants His51-Ala and Tyr49-Ala retained activity while
the other mutants did not may be explained by the fact
that Tyr49 and His51 coordinate the calcium ion through
their main chain oxygen atoms rather than through their
side chains. These residues are therefore more amenable
to substitution without eliminating catalytic activity.

Catalytic centre and proposed reaction mechanism 

By comparing the CHAP K protein with other proteins
with a similar function and structure (endolysins, CHAP
domains and others) and by aligning their sequences,
we can deduce that the catalytic residues
are highly conserved. In the CHAP domain of Staphylococcus
saprophyticus (PDB code 2K3A), the authors
describe the presence of a proteolytic triad formed by
Cys57, His109 and Glu126 [17], a catalytic triad also
found in other members of the CA clan. In the streptococcal
phage lysin PlyC (PDB code 4F88), the catalytic residues
are Cys333 and His420 [15], while in the NlpC/P60
domain of lipoprotein SPR from E. coli (PDB code 3H41)
the catalytic residues are Cys68, His119 and His131 [22].
In CHAP K these residues correspond in the alignment to
Cys54 located in the second alpha-helix, His117 in beta-strand
C and Glu134 in beta-strand D, making these
amino acids good candidates to form the catalytic triad of
the enzyme (Figure 5). These hypothetical catalytic
residues are close to the hydrophobic cleft, which supports
the possibility that the catalytic part of the molecule is
located in the hydrophobic groove. The predicted pKa of
His117 is 9.3. This value contrasts with those of the other
histidines in the protein: His51 (pKa 5.4), His91 (pKa 6.8)
and His157 (pKa 5.2). His117 may thus be protonated at
physiological pH.

Figure 4 Overexpression and activity of CHAP K mutants. A. Sodium dodecyl sulphate polyacrylamide gel electrophoresis of lysates
containing over-expressed CHAP K and site-directed mutants. A control not expressing CHAP K is also included. B. Composite zymogram gel of
CHAP K , site-directed mutant CHAP K variants and negative control expression lysates.
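
The statement that His117 may be protonated at physiological pH follows from the Henderson-Hasselbalch relation applied to the predicted pKa values quoted in the text. The short Python sketch below makes that arithmetic explicit; the pH of 7.4 is an assumed value for "physiological pH" and the function name is hypothetical.

```python
def protonated_fraction(pka, ph):
    """Fraction of a titratable group in its protonated form.

    From the Henderson-Hasselbalch equation:
    fraction = 1 / (1 + 10 ** (pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Predicted pKa values quoted in the text (PROPKA predictions).
PREDICTED_PKA = {"His51": 5.4, "His91": 6.8, "His117": 9.3, "His157": 5.2}

# Assumption of this sketch: physiological pH is taken as 7.4.
ASSUMED_PH = 7.4

for residue, pka in PREDICTED_PKA.items():
    fraction = protonated_fraction(pka, ASSUMED_PH)
    print(f"{residue}: pKa {pka}, about {100 * fraction:.0f}% protonated")
```

Under these assumptions His117 comes out almost fully protonated, whereas the other three histidines are predominantly neutral, in line with the contrast drawn in the text.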

Mutation of the conserved Cys54 and His117 residues
to alanine resulted in complete elimination of staphylolytic
activity of the enzyme as demonstrated by zymographic
analysis, indicating an essential role of these residues and
supporting the hypothesis that they are part of the catalytic
triad. Glu134 is believed to be the other residue of
the catalytic triad, but is not as highly conserved as the
other two residues. When this residue was mutated to alanine,
it was clear from zymogram results that, although
the catalytic activity was not completely eliminated, it was
strongly reduced. In the absence of Glu134 perhaps another
residue can take over its role.

Figure 5 The proposed catalytic triad of the bacteriophage K
endolysin CHAP domain CHAP K . Cys54 (bottom), His117 (middle)
and Glu134 (top) and the distances between them (in Å) are shown.

A likely mechanism of action, analogous to that of
other papain-like proteases [23,24], is the following: Glu134
accepts a proton from the protonated imidazole group
of His117. His117 subsequently accepts a proton from
the sulphydryl group of Cys54 (through its N-epsilon).
The deprotonated Cys54 then performs a nucleophilic
attack on the peptide bond between D-Ala and Gly in
the staphylococcal peptidoglycan. As a result, a transacylation
reaction between the enzyme and substrate
occurs, giving rise to an acyl-enzyme intermediate.
This intermediate may be hydrolyzed to release the enzyme
and the cleaved peptidoglycan [25]. In the NlpC/P60
domain of lipoprotein SPR from E. coli, there is a
tyrosine residue (Tyr56) that has been reported to be
highly conserved and which may modulate Cys nucleophilicity
or help in substrate binding [22]. In the case of
CHAP K , Tyr140 is located in an equivalent position, but
probably has a different role, since its phenol group points
in the opposite direction. Cysteine proteases have an oxyanion
hole, which helps to stabilize the developing negative
charge during the formation of the acyl-enzyme
intermediate [26]. Asn136, which is located in close proximity
to the catalytic triad, is one residue hypothesized
to be involved in creating the oxyanion hole. When
this residue was mutated to an alanine, the activity of
the enzyme was visibly reduced, but not completely
eliminated, supporting the aforementioned hypothesis.

Comparison with LysGH15 CHAP domain structure 

While this manuscript was under review, a paper describing
the structures of the CHAP domain (PDB entry
4OLK), amidase-2 domain (PDB entry 4OLS) and the
SH3 domain (PDB entry 2MK5) of the endolysin
LysGH15 from phage GH15 was published [27]. The
first two were solved by X-ray crystallography at 2.7
and 2.2 Å resolution respectively, while the latter was
solved by NMR spectroscopy. Phages GH15 and K
share 97% identity in 84% of their genomes (Genbank
entries NC_019448 and NC_005880, respectively)
[2,28]. The LysGH15 and LysK protein sequences are
virtually identical, with only four amino acid differences
in their 495-residue sequences. Of the differences,
two are in the CHAP domain: Val26 of CHAP K
is an isoleucine in CHAP GH15 and Glu113 of CHAP K
is a glutamine in CHAP GH15. The high sequence similarity
means the enzymes are almost identical and
expected to share the same properties.

When the crystal structures of the CHAP domains are
compared, it is notable that the space groups and crystal
packing are very different, which suggests the protein is
a monomer in solution and inter-monomer interactions
in the crystal are not likely to be biologically relevant.
Given the almost identical sequences, it is not surprising
that the monomer structures are highly similar; superposition
of the two CHAP domains leads to an r.m.s.d.
of 0.3 Å when 139 C-alpha atoms are superposed. The
only significant difference in main-chain conformation is
present in residues 109-116, which follow a different
path in the two structures. This may indicate that this
loop, which is directed away from the active site, is flexible
and of limited importance to the structure and
activity of the enzyme. The large side-chains of Tyr49,
Trp73, Tyr140 and Tyr153, which are all on the surface
of the protein, show different orientations.

The higher resolution of the CHAP K structure when
compared to the CHAP GH15 structure (1.8 vs. 2.7 Å)
should have led to more accurate placement of side-chain
atoms and solvent molecules. In both structures, a
buffer molecule occupies the groove that likely accommodates
the peptidoglycan substrate: a Bis-Tris molecule
(2-[Bis(2-hydroxyethyl)amino]-2-(hydroxymethyl)-1,3-propanediol)
in between the two monomers of the asymmetric
unit of CHAP GH15 and a MES and HEPES molecule in
the case of the native and derivative structures of CHAP K ,
respectively. The calcium ion is in exactly the same position,
as are its coordinating residues and the EF-hand-like
domain in which it is incorporated. No zinc ion was observed
in the CHAP GH15 crystals.

Gu et al. also performed site-directed mutagenesis studies
[27], but on the intact LysGH15 enzyme, not on the
isolated CHAP GH15 domain. As observed for CHAP K , it
was found that mutating the active site residue Cys54
affected bacterial lysis activity strongly. Mutating the
calcium ion coordinating residues Asp45, Asp47 and
Asp56 also diminished activity about ten-fold, while Tyr49
and His51 seem less important, as we also observed.



Conclusions 

We determined the structure of the CHAP K domain of
LysK at 1.8 Å resolution (1 Å = 0.1 nm). The structure
has the papain-type fold with a long loop between the
two amino-terminal alpha-helices. The structure suggests
the location of the active site near a hydrophobic
groove, with Cys54, His117 and Glu134 forming the
catalytic triad. The substrate most likely binds to the
hydrophobic groove.

A calcium ion was found tightly bound to the protein. 
Its ligands are the side-chains of Asp45, Asp47 and 
Asp56, plus the backbone oxygens of Tyr49 and His51, 
all in the amino-terminal domain specific to CHAP K . It 
likely has a structural role, stabilizing the protein fold. It 
may also be involved in ensuring the correct location of 
the peptidoglycan inside the catalytic cleft or in the 
stabilization of the negative charge of the tetrahedral 
intermediate during catalysis. A zinc ion was also found
and is likely more loosely bound, as it is less buried, has
fewer protein ligands and could be exchanged for a methylmercury
ion upon derivatization. Its role, if any, may
be regulatory.

Based on the structure, we propose a possible reaction 
mechanism, involving all three residues of the likely cata- 
lytic triad. Future studies will include co-crystallization 
with peptidoglycan analogues and elucidating the role of 
the CHAP K domain in the complete LysK protein. This 
may allow site-directed mutation to modulate the pep- 
tidoglycan specificity and activity of both the CHAP K 
and LysK enzymes. 

Methods 

CHAP K was expressed, purified and crystallized, and
crystallographic data were collected, as described [9,12]. A
complete native dataset was collected to 1.8 Å resolution
with good statistics. A dataset to 1.7 Å resolution, but
with inferior completeness, was also collected from a
methylmercury chloride derivative at the Hg L-I edge
[12]. Nevertheless, this dataset allowed phase determination
by single-wavelength anomalous dispersion (SAD) and automatic
model building of four crystallographically independent
protein molecules in the P1 unit cell [12] (Table 1) using
the ARP-WARP program [29]. The model was refined
against the derivative dataset and separately against the
native dataset. The models were completed and adjusted
using COOT [30] and refined with REFMAC5, using
local non-crystallographic symmetry restraints [31] and
taking care to select the same reflections for calculation
of Rfree [32]. To confirm the presence of zinc and calcium
ions in the sample, an X-ray fluorescence emission
spectrum was collected on a native protein crystal at
ESRF beamline ID23-1 [33]. Validation was performed
with MolProbity [34]. Refinement and validation statistics
are shown in Table 1.

Crystal contact analysis was done with PISA [35]; other
analyses were performed with the CCP4 suite [36]. Structural
similarity analysis was performed with DALI [37]; for
plotting a protein surface coloured according to amino acid
conservation, CONSURF was used [38]. The pKa of selected
residues in the protein structure was predicted with
PROPKA [39]. The structural models and underlying data
files have been submitted to the PDB (accession code
4CSH for the native structure and 4CT3 for the derivative).
PYMOL (Schrödinger LLC, Portland OR, USA) was used
for making structure figures and TOPDRAW [40] to draw
the secondary structure diagram.

CHAP K mutants were created using the QuikChange II 
Site-Directed Mutagenesis Kit from Agilent (Santa Clara 
CA, USA) as per the manufacturer's instructions. Crude 
cell lysate was analyzed for over-expression using sodium 
dodecyl sulphate gel electrophoresis and for ability to 
lyse Staphylococcus aureus cells using zymographic gels 
as described previously [41]. 

Additional file 



Additional file 1: Table S1. Sequence alignment underlying the colour
coding of Figure 1D.